Corpus-based Method for Automatic Identification

Authors

  • Gregory Grefenstette
  • Simone Teufel
Abstract

Nominalization is a highly productive phenomenon in most languages. The process of nominalization ejects a verb from its syntactic role into a nominal position. The original verb is often replaced by a semantically emptied support verb (e.g., make a proposal). The choice of a support verb for a given nominalization is unpredictable, causing a problem for language learners as well as for natural language processing systems. We present here a method of discovering support verbs from an untagged corpus via low-level syntactic processing and comparison of arguments attached to verbal forms and potential nominalized forms. The result of the process is a list of potential support verbs for the nominalized form of a given predicate.
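
As a rough illustration of the final step described in the abstract (producing a list of potential support verbs for a nominalized form), the sketch below ranks verbs by how often they take a given nominalization as their direct object. It is a toy under stated assumptions, not the authors' method: the function name `candidate_support_verbs`, the `(verb, object)` pair format, and the sample data are all hypothetical, and the pairs are assumed to have been extracted already by some low-level syntactic processing.

```python
from collections import Counter


def candidate_support_verbs(verb_object_pairs, nominalization, top_n=3):
    """Rank verbs by how often they take `nominalization` as direct object.

    `verb_object_pairs` is an assumed input format: an iterable of
    (verb_lemma, object_lemma) tuples produced by an earlier parsing step.
    """
    counts = Counter(
        verb for verb, obj in verb_object_pairs if obj == nominalization
    )
    # most_common sorts by descending count; ties keep first-seen order.
    return [verb for verb, _ in counts.most_common(top_n)]


# Hypothetical verb-object pairs, invented purely for illustration.
pairs = [
    ("make", "proposal"),
    ("make", "proposal"),
    ("make", "proposal"),
    ("reject", "proposal"),
    ("reject", "proposal"),
    ("submit", "proposal"),
    ("make", "decision"),
]

print(candidate_support_verbs(pairs, "proposal"))  # ['make', 'reject', 'submit']
```

Raw frequency alone over-generates: "reject" ranks highly here although it is not semantically empty. The abstract also mentions comparing the arguments attached to verbal and nominalized forms, which this sketch does not attempt.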

Similar resources

Automatic Term Identification and Classification in Biology Texts

The rapid growth of collections in online academic databases has meant that there is increasing difficulty for experts who want to access information in a timely and efficient way. We seek here to explore the application of information extraction methods to the identification and classification of terms in biological abstracts from MEDLINE. We explore the use of a statistical method and a decision tr...

Automatic Sublanguage Identification for a New Text

A number of theoretical studies have been devoted to the notion of sublanguage, which mainly concerns linguistic phenomena restricted by the domain or context. Furthermore, there are some successful NLP systems which have explicitly or implicitly addressed the sublanguage restrictions (e.g., TAUM-METEO, ATR). This suggests the following two objectives for future NLP research: automatic linguistic knowled...

Ambiguity reduction in speaker identification by the relaxation labeling process

A nonlinear probabilistic model of the relaxation labeling (RL) process is implemented in the speaker identification task in order to disambiguate the labeling of the speech feature vectors. In this proposed algorithm, the deterministic labeling of the vector quantization (VQ)-based speaker identification is relaxed by means of introducing initial probabilistic weights to the labeling process of ...

A New Direction

There have been a number of theoretical studies devoted to the notion of sublanguage. Furthermore, there are some successful natural language processing systems which have explicitly or implicitly utilized sublanguage restrictions. However, two big problems are still unsolved to utilize the sublanguage notion: automatic definition and dynamic identification of a text to sublanguage and automatic l...

A Similarity Measure for Automatic Audio Classification

This paper presents recent results using statistics generated by an MMI-supervised vector quantizer as a measure of audio similarity. Such a measure has proved successful for talker identification, and the extension from speech to general audio, such as music, is straightforward. A classifier that distinguishes speech from music and non-vocal sounds is presented, as well as experimental results sh...

Publication date: 1995